Improving Unsupervised Defect Segmentation by Applying Structural Similarity to Autoencoders
Convolutional autoencoders have emerged as popular methods for unsupervised
defect segmentation on image data. Most commonly, this task is performed by
thresholding a pixel-wise reconstruction error based on an ℓ^p distance.
This procedure, however, leads to large residuals whenever the reconstruction
contains slight localization inaccuracies around edges. It also fails to
reveal defective regions that have been visually altered while their intensity
values stay roughly consistent. We show that these problems prevent such
approaches from being applied to complex real-world scenarios and that they
cannot easily be avoided by employing more elaborate architectures such as
variational or feature-matching autoencoders. We propose to use a perceptual
loss function based on structural similarity, which examines inter-dependencies
between local image regions, taking into account luminance, contrast and
structural information, instead of simply comparing single pixel values. On a
challenging real-world dataset of nanofibrous materials and a novel dataset of
two woven fabrics, it achieves significant performance gains over
state-of-the-art approaches for unsupervised defect segmentation that use
pixel-wise reconstruction error metrics.
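The abstract contrasts a per-pixel reconstruction error with a structural-similarity (SSIM) comparison. The following Python/NumPy sketch illustrates the second failure mode it describes: a faint texture defect whose mean intensity matches the background yields only a tiny per-pixel squared error, while 1 - SSIM flags it clearly. The toy image, the flat "reconstruction" standing in for an autoencoder output, and the helper `ssim_map` are hypothetical. The SSIM here uses a uniform 7×7 window and the usual constants C1 = (0.01·L)², C2 = (0.03·L)² from Wang et al. (2004), which may differ from the exact settings used in the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim_map(x, y, win=7, data_range=1.0):
    """Simplified SSIM evaluated on every fully contained win x win window."""
    c1 = (0.01 * data_range) ** 2  # stabilizing constants as in Wang et al. (2004)
    c2 = (0.03 * data_range) ** 2
    # Shape (H - win + 1, W - win + 1, win, win): one local patch per output position.
    px = sliding_window_view(x, (win, win))
    py = sliding_window_view(y, (win, win))
    mu_x = px.mean(axis=(-2, -1))
    mu_y = py.mean(axis=(-2, -1))
    dx = px - mu_x[..., None, None]
    dy = py - mu_y[..., None, None]
    var_x = (dx ** 2).mean(axis=(-2, -1))
    var_y = (dy ** 2).mean(axis=(-2, -1))
    cov_xy = (dx * dy).mean(axis=(-2, -1))
    # Luminance comparison times the combined contrast/structure comparison.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


# Toy example: flat gray surface with a faint checkerboard defect whose mean
# intensity matches the background.
img = np.full((32, 32), 0.5)
ii, jj = np.indices((12, 12))
img[10:22, 10:22] += np.where((ii + jj) % 2 == 0, 0.1, -0.1)
# Hypothetical defect-free reconstruction, as an autoencoder trained only on
# defect-free samples might produce.
recon = np.full((32, 32), 0.5)

l2_residual = (img - recon) ** 2            # per-pixel squared error
ssim_residual = 1.0 - ssim_map(img, recon)  # high where local structure differs

print(f"max per-pixel squared error: {l2_residual.max():.3f}")
print(f"max 1 - SSIM residual:       {ssim_residual.max():.3f}")
```

Because only fully contained windows are evaluated, the SSIM residual map is 6 pixels smaller than the image in each dimension; a practical implementation would pad the image or use a Gaussian window.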
Complex-Valued Autoencoders for Object Discovery
Object-centric representations form the basis of human perception and enable
us to reason about the world and to systematically generalize to new settings.
Currently, most machine learning work on unsupervised object discovery focuses
on slot-based approaches, which explicitly separate the latent representations
of individual objects. While the result is easily interpretable, it usually
requires the design of involved architectures. In contrast to this, we propose
a distributed approach to object-centric representations: the Complex
AutoEncoder. Following a coding scheme theorized to underlie object
representations in biological neurons, its complex-valued activations represent
two messages: their magnitudes express the presence of a feature, while the
relative phase differences between neurons express which features should be
bound together to create joint object representations. We show that this simple
and efficient approach achieves better reconstruction performance than an
equivalent real-valued autoencoder on simple multi-object datasets.
Additionally, we show that its unsupervised object discovery performance is
competitive with that of a SlotAttention model on two datasets, and that it
manages to disentangle objects in a third dataset where SlotAttention fails,
all while being 7 to 70 times faster to train.
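The binding scheme described above can be made concrete with a toy readout: the magnitude |z| of each complex activation decides whether a feature is present, and features whose phases arg z lie close together are grouped into the same object. In the Python sketch below, the activations are hand-picked hypothetical values rather than outputs of a trained Complex AutoEncoder, and `group_by_phase` is a hypothetical helper that runs a small k-means on the unit-circle embedding (cos φ, sin φ) of the phases; it illustrates the idea and is not necessarily the paper's exact evaluation procedure.

```python
import numpy as np


def group_by_phase(z, n_groups, min_magnitude=0.1, n_iter=20):
    """Assign each complex activation to a group by clustering its phase.

    |z| is read as feature presence: entries below min_magnitude are treated
    as absent and labeled -1. Phases are embedded on the unit circle so that
    angles near -pi and +pi count as close.
    """
    labels = np.full(z.shape, -1)
    present = np.abs(z) >= min_magnitude
    phases = np.angle(z[present])
    pts = np.stack([np.cos(phases), np.sin(phases)], axis=1)
    # Deterministic init: spread the initial centroids evenly over the points.
    centroids = pts[np.linspace(0, len(pts) - 1, n_groups).astype(int)]
    for _ in range(n_iter):
        dists = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(n_groups):
            if np.any(assign == k):
                centroids[k] = pts[assign == k].mean(axis=0)
    labels[present] = assign
    return labels


# Hypothetical activations: two "objects" with distinct phases plus a weak background entry.
z = np.array([
    1.0 * np.exp(1j * 0.3), 0.8 * np.exp(1j * 0.4), 0.9 * np.exp(1j * 0.2),
    1.0 * np.exp(1j * 2.5), 0.7 * np.exp(1j * 2.6),
    0.01 * np.exp(1j * 1.0),
])
labels = group_by_phase(z, n_groups=2)
```

Only relative phase differences matter for the resulting groups: rotating every phase by the same angle leaves the assignment unchanged, which matches the abstract's emphasis on relative phase.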
Rotating Features for Object Discovery
The binding problem in human cognition, concerning how the brain represents
and connects objects within a fixed network of neural connections, remains a
subject of intense debate. Most machine learning efforts addressing this issue
in an unsupervised setting have focused on slot-based methods, which may be
limiting due to their discrete nature and their difficulty in expressing
uncertainty.
Recently, the Complex AutoEncoder was proposed as an alternative that learns
continuous and distributed object-centric representations. However, it is only
applicable to simple toy data. In this paper, we present Rotating Features, a
generalization of complex-valued features to higher dimensions, and a new
evaluation procedure for extracting objects from distributed representations.
Additionally, we show the applicability of our approach to pre-trained
features. Together, these advancements enable us to scale distributed
object-centric representations from simple toy to real-world data. We believe
this work advances a new paradigm for addressing the binding problem in machine
learning and has the potential to inspire further innovation in the field.
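A rotating feature can be pictured as an n-dimensional vector per position whose norm plays the role of the complex magnitude (presence) and whose direction plays the role of the phase (object identity); for n = 2 this reduces to a complex number r·e^{iφ} written as r·(cos φ, sin φ). The Python sketch below illustrates that reading with hypothetical hand-set features. The helpers `split_magnitude_orientation` and `group_by_orientation`, and the thresholds `min_magnitude` and `min_cosine`, are assumptions; the greedy cosine-similarity grouping is a simple stand-in and not the paper's evaluation procedure.

```python
import numpy as np


def split_magnitude_orientation(features, eps=1e-8):
    """Split (num_positions, n) rotating features into norms and unit directions."""
    magnitude = np.linalg.norm(features, axis=-1)
    orientation = features / (magnitude[:, None] + eps)
    return magnitude, orientation


def group_by_orientation(features, min_magnitude=0.1, min_cosine=0.9):
    """Greedy grouping of present features by the direction they point in.

    Features with norm below min_magnitude are labeled -1 ("absent"). Each
    remaining feature, visited from strongest to weakest, joins the first group
    whose reference direction has cosine similarity >= min_cosine with it;
    otherwise it starts a new group.
    """
    magnitude, orientation = split_magnitude_orientation(features)
    labels = np.full(len(features), -1)
    refs = []
    for i in np.argsort(-magnitude):
        if magnitude[i] < min_magnitude:
            continue
        for k, ref in enumerate(refs):
            if orientation[i] @ ref >= min_cosine:
                labels[i] = k
                break
        else:  # no existing group matched
            refs.append(orientation[i])
            labels[i] = len(refs) - 1
    return labels


# Hypothetical 4-dimensional rotating features at six positions.
features = np.array([
    [1.00, 0.05, 0.00, 0.00],  # object A
    [0.60, 0.00, 0.05, 0.00],  # object A, weaker
    [0.00, 0.90, 0.00, 0.10],  # object B
    [0.02, 0.80, 0.00, 0.00],  # object B
    [0.00, 0.00, 0.00, 0.70],  # object C
    [0.01, 0.02, 0.00, 0.00],  # background: near-zero magnitude
])
labels = group_by_orientation(features)
```

For n = 2 the norm returned by `split_magnitude_orientation` equals the modulus of the complex number Re + i·Im, so the complex-valued case is recovered as a special case of this representation.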
CITRIS: Causal Identifiability from Temporal Intervened Sequences
Understanding the latent causal factors of a dynamical system from visual
observations is considered a crucial step towards agents reasoning in complex
environments. In this paper, we propose CITRIS, a variational autoencoder
framework that learns causal representations from temporal sequences of images
in which underlying causal factors have possibly been intervened upon. In
contrast to the recent literature, CITRIS exploits temporality and observed
intervention targets to identify scalar and multidimensional causal factors,
such as 3D rotation angles. Furthermore, by introducing a normalizing flow,
CITRIS can be easily extended to leverage and disentangle representations
obtained by already pretrained autoencoders. Extending previous results on
scalar causal factors, we prove identifiability in a more general setting, in
which only some components of a causal factor are affected by interventions. In
experiments on 3D rendered image sequences, CITRIS outperforms previous methods
on recovering the underlying causal variables. Moreover, using pretrained
autoencoders, CITRIS can even generalize to unseen instantiations of causal
factors, opening future research areas in sim-to-real generalization for causal
representation learning.
Comment: Accepted at the International Conference on Machine Learning (ICML), 202
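The data setting CITRIS learns from can be illustrated with a toy simulation of a temporal intervened sequence: each causal factor evolves from the previous time step unless it is intervened upon, and the binary intervention targets are recorded alongside the factors. Everything in this Python sketch is an assumption for illustration: the linear-Gaussian dynamics given by the hypothetical matrix `A`, the uniform intervention distribution, the scalar factors (CITRIS also handles multidimensional ones), and the helper `simulate_tris`. CITRIS itself observes rendered images of such factors rather than the factors directly; the rendering step is omitted here.

```python
import numpy as np


def simulate_tris(A, steps, intervention_prob, noise_std=0.1, seed=0):
    """Toy temporal sequence of causal factors with observed intervention targets.

    c[t+1] = A @ c[t] + Gaussian noise, except that every factor i with
    targets[t+1, i] == True is overwritten by a draw from an intervention
    distribution (here uniform on [-2, 2]) that ignores the past.
    """
    rng = np.random.default_rng(seed)
    k = A.shape[0]
    c = np.zeros((steps + 1, k))
    targets = np.zeros((steps + 1, k), dtype=bool)
    for t in range(steps):
        natural = A @ c[t] + noise_std * rng.standard_normal(k)
        targets[t + 1] = rng.random(k) < intervention_prob
        intervened = rng.uniform(-2.0, 2.0, size=k)
        c[t + 1] = np.where(targets[t + 1], intervened, natural)
    return c, targets


# Hypothetical temporal causal graph: c1 -> c2 -> c3 across time steps,
# each factor also depending on its own previous value.
A = np.array([[0.9, 0.0, 0.0],
              [0.5, 0.8, 0.0],
              [0.0, 0.3, 0.7]])
causal, targets = simulate_tris(A, steps=100, intervention_prob=0.1)
```

A representation learner in this setting would receive images rendered from `causal[t]` together with `targets[t]`; the targets mark which factors changed for reasons unrelated to their past, and the abstract names these observed targets, together with temporality, as what CITRIS exploits to identify the causal factors.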